Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

Authors

  • Rafael Gómez-Bombarelli
  • David K. Duvenaud
  • José Miguel Hernández-Lobato
  • Jorge Aguilera-Iparraguirre
  • Timothy D. Hirzel
  • Ryan P. Adams
  • Alán Aspuru-Guzik

Abstract

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.
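
The latent-space operations named in the abstract (perturbing a known point, interpolating between two points, and gradient-based optimization against a property predictor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the latent dimension, the function names, the use of plain linear interpolation, and above all the predictor, which is a toy concave surrogate rather than a trained neural network. The encoder and decoder are not modeled at all, so the vectors below stand in for already-encoded molecules.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8  # hypothetical latent size, chosen only for illustration

# Hypothetical stand-in for a trained property predictor:
# f(z) = w.z - 0.5 * ||z||^2, a concave function maximized at z = w.
w = rng.normal(size=LATENT_DIM)


def predict(z):
    """Toy property estimate for a latent vector z."""
    return float(w @ z - 0.5 * (z @ z))


def predict_grad(z):
    """Analytic gradient of the toy predictor with respect to z."""
    return w - z


def perturb(z, scale=0.1):
    """Add Gaussian noise to a latent vector (a nearby latent point)."""
    return z + scale * rng.normal(size=z.shape)


def interpolate(z_a, z_b, steps=5):
    """Linear interpolation between two latent vectors, endpoints included."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - t) * z_a + t * z_b for t in ts])


def gradient_ascent(z, lr=0.1, n_steps=50):
    """Move z uphill on the predicted property with fixed-step gradient ascent."""
    for _ in range(n_steps):
        z = z + lr * predict_grad(z)
    return z


# Two random latent vectors standing in for two encoded molecules.
z_a = rng.normal(size=LATENT_DIM)
z_b = rng.normal(size=LATENT_DIM)

z_near = perturb(z_a)           # a neighbor of z_a
path = interpolate(z_a, z_b)    # points on the segment from z_a to z_b
z_opt = gradient_ascent(z_a)    # latent point with a higher predicted property
```

In a full system of the kind the abstract describes, each vector in `path`, as well as `z_near` and `z_opt`, would be passed to the decoder to obtain a discrete molecular representation. That step is omitted here because no decoder is defined in this sketch.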

Similar resources

Automatic Chemical Design using Variational Autoencoders

We train a variational autoencoder to convert discrete representations of molecules to and from a multidimensional continuous representation. This continuous representation allows us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Con...

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...

Effect of one Bout Continuous Versus Intermittent Aerobic Exercise on Plasma Levels of Intercellular Adhesion Molecules 1 and Vascular Cell Adhesion Molecules 1 in Patients with Coronary Heart Disease

Introduction: Adhesion molecules play an important role in the pathogenesis of atherosclerosis and the type of training may affect the response to these indicators. Therefore, the purpose of the present study was to investigate the effect of a continuous versus interval aerobic training session on plasma levels of intercellular adhesion molecules 1 (ICAM-1) and vascular cell adhesion molecules ...

Modeling of Continuous Systems Using Modified Petri Nets

Due to the changes which may occur in their parameters, systems are usually represented by several subsystems for different conditions. This paper employs Modified Petri Nets (MPN) to model these subsystems and makes it simple to analyze them. In this method, first, the continuous transfer function is converted to a discrete transfer function and then, by MPN, the system is modeled and analyzed. All...

Journal title:

Volume 4  Issue

Pages  -

Publication date: 2018